Deep learning model inference is a critical service in many enterprise and scientific discovery pipelines. This paper introduces Ribbon, a novel deep learning inference serving system that meets two competing objectives: quality-of-service (QoS) targets and cost-effectiveness. The key idea behind Ribbon is to intelligently employ a mix of cloud computing instance types (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian optimization-driven strategy that helps users build the optimal heterogeneous instance pool for their model inference serving needs on a cloud computing platform, and it demonstrates its superiority over inference serving systems that use homogeneous instance pools. Ribbon saves up to 16% of the inference serving cost for a range of learning models, including emerging deep learning recommender system models and models enabling drug discovery.
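As an illustration of the search problem Ribbon tackles, the sketch below brute-forces a tiny configuration space of hypothetical instance types to find the cheapest pool that meets a throughput-style QoS target; the instance specs and QoS model are invented for the example, and Ribbon replaces this brute force with Bayesian optimization.

```python
# Minimal sketch (not Ribbon's implementation): exhaustive search over small
# heterogeneous instance pools. Instance specs, the QoS target, and the cost
# model are hypothetical placeholders.
from itertools import product

INSTANCE_TYPES = {            # hypothetical cloud instances: $/hour, queries/sec each
    "cpu.large":  {"cost": 0.34, "qps": 120},
    "cpu.xlarge": {"cost": 0.68, "qps": 260},
    "gpu.small":  {"cost": 1.20, "qps": 900},
}
QOS_QPS = 2000                # required aggregate throughput (stand-in for a latency SLO)
MAX_PER_TYPE = 4

def pool_cost(pool):
    return sum(INSTANCE_TYPES[t]["cost"] * n for t, n in pool.items())

def pool_qps(pool):
    return sum(INSTANCE_TYPES[t]["qps"] * n for t, n in pool.items())

best = None
for counts in product(range(MAX_PER_TYPE + 1), repeat=len(INSTANCE_TYPES)):
    pool = dict(zip(INSTANCE_TYPES, counts))
    if pool_qps(pool) >= QOS_QPS:                 # QoS-feasible configuration
        if best is None or pool_cost(pool) < pool_cost(best):
            best = pool

print("cheapest QoS-feasible heterogeneous pool:", best, f"${pool_cost(best):.2f}/h")
```

On this toy configuration space the cheapest feasible pool mixes instance types, which is the effect a heterogeneous pool is meant to exploit.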
Opinion summarization is the task of creating summaries that capture the popular opinions expressed in user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system for unsupervised extractive opinion summarization. GeoSumm involves an encoder-decoder based representation model that represents text as a distribution over latent semantic units. GeoSumm generates these representations by performing dictionary learning over pre-trained text representations at multiple decoder layers. We then use these representations to quantify the relevance of review sentences with a novel geodesic-distance-based scoring mechanism, and use the relevance scores to identify popular opinions that compose general as well as aspect-specific summaries. Our proposed model, GeoSumm, achieves state-of-the-art performance on three opinion summarization datasets. We perform additional experiments to analyze the workings of the model and demonstrate its ability to generalize across different domains.
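To make the scoring step concrete, the following sketch assumes review sentences have already been encoded as distributions over latent semantic units and ranks them by distance to the corpus-level distribution; the dictionary-learning stage is omitted, and Jensen-Shannon distance stands in for GeoSumm's geodesic scoring.

```python
# Minimal sketch of distribution-based relevance scoring (not GeoSumm itself).
# Sentence distributions are synthetic placeholders.
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
K, N = 16, 8                                   # latent units, review sentences
sent_dists = rng.dirichlet(np.ones(K), size=N) # placeholder sentence distributions

corpus_dist = sent_dists.mean(axis=0)          # aggregate "popular opinion" distribution
relevance = np.array([1.0 - jensenshannon(s, corpus_dist) for s in sent_dists])

top_k = np.argsort(relevance)[::-1][:3]        # most representative sentences
print("extractive summary sentence indices:", top_k.tolist())
```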
Machine learning systems are often deployed to make critical decisions such as credit lending and hiring. While making decisions, such systems often encode users' demographic information (e.g., gender, age) in their intermediate representations, which can lead to decisions biased against specific demographics. Prior work has focused on debiasing intermediate representations to ensure fair decisions. However, these approaches fail to remain fair when the task or the demographic distribution changes. To ensure fairness in the wild, it is important for a system to adapt to such changes as it accesses new data incrementally. In this work, we address this problem by introducing the task of learning fair representations in an incremental learning setting. To this end, we present Fairness-aware Incremental Representation Learning (FaIRL), a representation learning system that sustains fairness while incrementally learning new tasks. FaIRL achieves fairness and learns new tasks by controlling the rate-distortion function of the learned representations. Our empirical evaluations show that FaIRL makes fair decisions while achieving high performance on the target task, outperforming several baselines.
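The skeleton below only illustrates the incremental setting: tasks arrive one by one and each update mixes a task loss with a fairness regularizer on a shared encoder. The encoder, the synthetic data, and the group-mean penalty are placeholders; FaIRL's actual mechanism is the rate-distortion control described above.

```python
# Skeleton of incremental fair representation learning (not FaIRL's algorithm).
# `fairness_penalty` is a hypothetical stand-in for FaIRL's rate-distortion control.
import torch
import torch.nn as nn

def fairness_penalty(z, protected):
    # Placeholder: penalize the mean-representation gap between protected groups.
    g0, g1 = z[protected == 0], z[protected == 1]
    return (g0.mean(dim=0) - g1.mean(dim=0)).pow(2).sum()

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for task_id in range(3):                       # tasks arrive incrementally
    head = nn.Linear(8, 2)                     # new classification head per task
    opt.add_param_group({"params": head.parameters()})
    for _ in range(100):                       # toy training loop on synthetic data
        x = torch.randn(64, 32)
        y = torch.randint(0, 2, (64,))
        a = torch.randint(0, 2, (64,))         # protected attribute
        z = encoder(x)
        loss = nn.functional.cross_entropy(head(z), y) + 0.1 * fairness_penalty(z, a)
        opt.zero_grad()
        loss.backward()
        opt.step()
```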
Transformer models have recently emerged as one of the foundational models in natural language processing, and as a by-product there has been significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, which calls for more research into identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture, inspired by the literature on statistical language modeling, that augments the model with n-grams constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer, on language modeling on the C4 dataset and on text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model in JAX for reproducibility.
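A minimal sketch of the augmentation idea, not the open-sourced N-Grammer code: consecutive discrete codes are hashed into a bi-gram vocabulary and their embeddings are concatenated with the unigram embeddings before the Transformer layers. Vocabulary sizes, the hash, and the random embedding tables below are placeholders.

```python
# Sketch of bi-gram augmentation from discrete latent codes (toy scale).
import numpy as np

rng = np.random.default_rng(0)
VOCAB, NGRAM_VOCAB, D_TOK, D_NGRAM = 1000, 4096, 32, 8
tok_emb = rng.normal(size=(VOCAB, D_TOK))
ngram_emb = rng.normal(size=(NGRAM_VOCAB, D_NGRAM))

def bigram_ids(ids):
    # Hash each (previous, current) pair of discrete codes into the bi-gram table;
    # the first position has no predecessor and reuses its own ID.
    prev = np.concatenate([[ids[0]], ids[:-1]])
    return (prev * 1_000_003 + ids) % NGRAM_VOCAB

ids = rng.integers(0, VOCAB, size=16)                      # one toy sequence of codes
augmented = np.concatenate([tok_emb[ids], ngram_emb[bigram_ids(ids)]], axis=-1)
print(augmented.shape)                                     # (16, D_TOK + D_NGRAM)
```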
Text representations learned by machine learning models often encode undesirable demographic information about the user. Predictive models built on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), which removes protected information by using the rate-distortion function to make representations of instances belonging to the same protected attribute class uncorrelated. FaRM can debias representations with or without a target task at hand, and it can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and that the learned representations leak significantly less protected attribute information under non-linear probing attacks.
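The building block here is the coding-rate (rate-distortion) function; the sketch below computes it for a whole batch and for each protected group on synthetic representations, without reproducing how FaRM combines these terms into its training objective.

```python
# Sketch of the coding-rate function R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)
# on synthetic data; the way FaRM weights whole-batch and per-group rates is omitted.
import numpy as np

def coding_rate(Z, eps=0.5):
    """Z: (n, d) batch of representations."""
    n, d = Z.shape
    cov = (d / (n * eps ** 2)) * (Z.T @ Z)
    return 0.5 * np.linalg.slogdet(np.eye(d) + cov)[1]

rng = np.random.default_rng(0)
Z = rng.normal(size=(128, 16))
groups = rng.integers(0, 2, size=128)              # protected attribute labels

overall_rate = coding_rate(Z)
group_rates = [coding_rate(Z[groups == g]) for g in (0, 1)]
print(overall_rate, group_rates)
```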
Natural language processing (NLP) techniques can use a person's language to help diagnose medical conditions such as depression. Depression is a serious medical illness that can adversely affect how people feel, think, and act, which can lead to emotional and physical problems. Because of the sensitive nature of such data, privacy measures must be taken when handling it and training models on it. In this work, we study the effects of differential privacy (DP) on training contextualized language models (BERT, ALBERT, RoBERTa, and DistilBERT) in both centralized and federated learning (FL) settings. We offer insights into how to train NLP models privately and which architectures and setups provide more desirable privacy-utility trade-offs. We envision this work being used in future healthcare and mental health studies to keep medical histories private, and we therefore provide an open-source implementation of this work.
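For intuition, the sketch below shows one DP-SGD step (per-example gradient clipping plus Gaussian noise) on a toy logistic-regression model; the clip norm and noise multiplier are placeholders, and private fine-tuning of BERT-style models would additionally require a DP training library and a privacy accountant, neither of which is reproduced here.

```python
# Minimal DP-SGD step: clip each example's gradient, add Gaussian noise, update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))                  # toy batch of features
y = rng.integers(0, 2, size=32).astype(float)
w = np.zeros(10)
CLIP, NOISE_MULT, LR = 1.0, 1.1, 0.1           # placeholder privacy hyperparameters

def per_example_grads(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # logistic regression predictions
    return (p - y)[:, None] * X                 # one gradient row per example

g = per_example_grads(w, X, y)
norms = np.linalg.norm(g, axis=1, keepdims=True)
g_clipped = g / np.maximum(1.0, norms / CLIP)   # clip each example's gradient
noise = rng.normal(scale=NOISE_MULT * CLIP, size=w.shape)
w -= LR * (g_clipped.sum(axis=0) + noise) / len(X)
```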
We consider applications that involve a large number of instances of projecting points onto polytopes. Through theoretical and empirical analysis, we develop an intuition showing that when these instances follow certain structures, the large majority of projections lie on vertices of the polytopes. To perform these projections efficiently, we derive a vertex-oriented incremental algorithm for projecting a point onto an arbitrary polytope, along with specialized algorithms for simplex projection and for polytopes obtained by cutting the unit box with hyperplanes. Such settings are especially useful in web-scale applications such as optimal matching and allocation problems. Several problems in internet marketplaces (e-commerce, ride-sharing, food delivery, professional services, advertising, and so on) can be formulated as linear programs (LPs) whose polytope constraints require a projection step within the overall optimization procedure. We show that in recent work the polytope projection is the most expensive step, and that our efficient projection algorithms yield substantial performance improvements.
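As a concrete instance of one special case, the sketch below implements the classic sort-and-threshold Euclidean projection onto the probability simplex (Duchi et al., 2008); it is not the vertex-oriented incremental algorithm for arbitrary polytopes described above.

```python
# Euclidean projection onto the probability simplex {x : x >= 0, sum(x) = 1}.
import numpy as np

def project_to_simplex(v):
    u = np.sort(v)[::-1]                        # sort coordinates in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)        # shared shift that enforces sum(x) = 1
    return np.maximum(v + theta, 0.0)

x = project_to_simplex(np.array([0.9, 1.4, -0.3, 0.2]))
print(x, x.sum())                               # nonnegative entries summing to 1
```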
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GLoVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embedding has significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses self-supervised. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," thus being human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GLoVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
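Purely as an illustration of what such a "logical embedding" looks like as a data structure, the toy sketch below defines words by small hand-picked sets of contextual words and compares them by overlap; the Tsetlin Machine autoencoder that actually learns these clauses is not reproduced here.

```python
# Toy illustration of word semantics as sets of defining context words.
logical_embedding = {
    "coffee": {"black", "cup", "hot"},
    "tea":    {"cup", "hot", "leaf"},
    "ice":    {"cold", "water"},
}

def jaccard(a, b):
    # Overlap of defining words as a simple, human-readable similarity.
    return len(a & b) / len(a | b)

print(jaccard(logical_embedding["coffee"], logical_embedding["tea"]))  # similar words
print(jaccard(logical_embedding["coffee"], logical_embedding["ice"]))  # dissimilar words
```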
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.
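A minimal sketch of the conversion idea under stated assumptions: keep an exponentially decaying trace of past doses, feed it to the dynamics, and expose it in the observation so the augmented state is (approximately) Markov. The decay rate, the environment interface, and the drifting toy dynamics are invented for the example and are not the paper's benchmark tasks.

```python
# Sketch of restoring the Markov property by augmenting the state with a
# decayed "effective action" trace (toy environment, hypothetical dynamics).
class ProlongedEffectWrapper:
    def __init__(self, env, decay=0.9):
        self.env = env
        self.decay = decay
        self.effective_action = 0.0

    def reset(self):
        self.effective_action = 0.0
        return (self.env.reset(), self.effective_action)

    def step(self, action):
        # The lingering drug effect is the exponentially decayed sum of past doses.
        self.effective_action = self.decay * self.effective_action + action
        obs, reward, done = self.env.step(self.effective_action)
        return (obs, self.effective_action), reward, done

class ToyDoseEnv:
    def reset(self):
        self.level = 10.0
        return self.level

    def step(self, effective_dose):
        self.level += 1.0 - effective_dose       # drifts upward unless counteracted
        reward = -abs(self.level - 10.0)         # keep the level near a target
        return self.level, reward, False

env = ProlongedEffectWrapper(ToyDoseEnv())
state = env.reset()
state, reward, done = env.step(0.5)
```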
Unsupervised learning-based anomaly detection in latent space has gained importance since discriminating anomalies from normal data becomes difficult in high-dimensional space. Both density estimation and distance-based methods to detect anomalies in latent space have been explored in the past. These methods prove that retaining valuable properties of input data in latent space helps in the better reconstruction of test data. Moreover, real-world sensor data is skewed and non-Gaussian in nature, making mean-based estimators unreliable for skewed data. Again, anomaly detection methods based on reconstruction error rely on Euclidean distance, which does not consider useful correlation information in the feature space and also fails to accurately reconstruct the data when it deviates from the training distribution. In this work, we address the limitations of reconstruction error-based autoencoders and propose a kernelized autoencoder that leverages a robust form of Mahalanobis distance (MD) to measure latent dimension correlation to effectively detect both near and far anomalies. This hybrid loss is aided by the principle of maximizing the mutual information gain between the latent dimension and the high-dimensional prior data space by maximizing the entropy of the latent space while preserving useful correlation information of the original data in the low-dimensional latent space. The multi-objective function has two goals -- it measures correlation information in the latent feature space in the form of robust MD distance and simultaneously tries to preserve useful correlation information from the original data space in the latent space by maximizing mutual information between the prior and latent space.
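To ground the scoring side, the sketch below fits a mean and covariance on normal latent codes and flags test codes by Mahalanobis distance; the encoder, the robust MD estimator, and the mutual-information objective from the paper are not reproduced, and the latent codes are synthetic.

```python
# Sketch of latent-space anomaly scoring with Mahalanobis distance on synthetic codes.
import numpy as np

rng = np.random.default_rng(0)
z_train = rng.normal(size=(500, 8))                 # latent codes of normal data
z_test = np.vstack([rng.normal(size=(5, 8)),        # near-normal test codes
                    rng.normal(loc=6.0, size=(5, 8))])  # far anomalies

mu = z_train.mean(axis=0)
cov = np.cov(z_train, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularize for stability

def mahalanobis(z):
    d = z - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

scores = mahalanobis(z_test)
threshold = mahalanobis(z_train).mean() + 3 * mahalanobis(z_train).std()
print((scores > threshold).astype(int))             # 1 = flagged as anomalous
```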